37 research outputs found

    FPGA Implementation of Spectral Subtraction for In-Car Speech Enhancement and Recognition

    The use of speech recognition in noisy environments requires the use of speech enhancement algorithms in order to improve recognition performance. Deploying these enhancement techniques requires significant engineering to ensure algorithms are realisable in electronic hardware. This paper describes the design decisions and process to port the popular spectral subtraction algorithm to a Virtex-4 field-programmable gate array (FPGA) device. Resource analysis shows the final design uses only 13% of the total available FPGA resources. Waveforms and spectrograms presented support the validity of the proposed FPGA design
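As a rough illustration of the algorithm named in this abstract, the following is a minimal sketch of textbook magnitude spectral subtraction for a single frame. It is not the FPGA design described in the paper. The function name, the over-subtraction factor `alpha`, and the spectral `floor` are assumptions introduced here for illustration.

```python
import cmath

def spectral_subtract(noisy_spectrum, noise_mag, alpha=1.0, floor=0.01):
    """Subtract a noise magnitude estimate from each bin of one frame.

    noisy_spectrum: complex DFT bins of the noisy frame.
    noise_mag:      estimated noise magnitude per bin.
    alpha, floor:   hypothetical over-subtraction factor and spectral floor.
    """
    enhanced = []
    for y, n in zip(noisy_spectrum, noise_mag):
        mag, phase = cmath.polar(y)
        # Clamp to a fraction of the noisy magnitude so no bin goes negative.
        clean_mag = max(mag - alpha * n, floor * mag)
        # Magnitude-only subtraction reuses the noisy phase unchanged.
        enhanced.append(cmath.rect(clean_mag, phase))
    return enhanced

frame = [3 + 4j, 1 + 0j, 0.1j]
noise = [1.0, 0.5, 0.5]
out = spectral_subtract(frame, noise)
```

In this sketch the third bin falls below the noise estimate, so it is held at the floor rather than set to zero.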

    The Australian English speech corpus for in-car speech processing

    The Australian In-Car Speech Corpus is a multi-channel recording of a series of prompts from an in-car navigation task collected over a range of speakers in a variety of driving conditions. Its purpose is to provide a significant resource of speech data appropriate for investigating speech processing needs in the adverse environment of a car. Utterances spoken by 50 speakers were collected in seven different driving conditions, providing the foundation for investigation into noisy, speaker-independent speech processing. Speech recognition experiments are performed to validate the data, to provide baseline results for in-car speech recognition research, and to show that this data can improve speech recognition performance under adverse in-car conditions for Australian English when adapting from American English acoustic models

    Robust speech recognition using speech enhancement

    Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, word recognition performance of these systems has been shown to approach 100%; however, this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more robust. Single-channel spectral subtraction was originally designed to improve human speech intelligibility, and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics assess enhancement performance for intelligibility, not speech recognition, which makes them sub-optimal for ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) introducing phase spectrum information to enable spectral subtraction in the complex frequency domain. Likelihood-maximisation uses gradient descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word sequence known a priori. Whilst this technique is shown to improve the ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes which are typically used in ASR feature extraction. Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy across a wide range of SNRs. Throughout the dissertation, consideration is given to practical implementation in vehicular environments, which resulted in two novel contributions – a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Australian English In-Car Speech Corpus which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments
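The stationarity assumption mentioned in this abstract can be illustrated with a small sketch: if a sinusoid keeps a fixed frequency, its phase advances linearly with time, so a phase observed now can be projected forward by a known delay. This shows only that underlying property and is not the dissertation's Phase Estimation via Delay Projection method. The function name and parameters are hypothetical.

```python
import math

def project_phase(phase, freq_hz, delay_samples, sample_rate_hz):
    """Project the phase of a stationary sinusoid forward by a delay.

    Assumes the frequency is constant over the delay (stationarity).
    The result is wrapped to the interval [-pi, pi].
    """
    advanced = phase + 2.0 * math.pi * freq_hz * delay_samples / sample_rate_hz
    return math.atan2(math.sin(advanced), math.cos(advanced))

# A 1 kHz tone sampled at 16 kHz advances a quarter cycle in 4 samples.
projected = project_phase(0.0, 1000.0, 4, 16000.0)
```

The shorter the delay, the more reasonable it is to treat the frequency as fixed over that interval.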

    A modified LIMA framework for spectral subtraction applied to in-car speech recognition

    In noisy environments, speech recognition accuracy degrades significantly. Speech enhancement algorithms have been designed to overcome this; however, solutions to date have not been optimal for speech recognition, especially for non-stationary noise like that in a car. Recently, a likelihood-maximising (LIMA) criterion has been applied to speech enhancement techniques. This paper analyses the suitability of spectral subtraction for potential use under a modified version of this framework where direct access to and manipulation of speech recognition models is not available. Analysis shows spectral subtraction is suited to this holistic LIMA approach by confirming the cost surface is appropriate for gradient descent methods. It is also observed that there are regions on the cost surface where performance exceeds that achieved by parameter values traditionally selected for spectral subtraction

    Flexible airport terminal design: Towards a framework

    Flexibility is a key driver of any successful design, particularly in a highly unpredictable environment such as an airport terminal. The ever-growing aviation industry requires airport terminals to be planned and constructed in a way that allows flexibility for future design, alteration and redevelopment. The concept of flexibility in terminal design is a relatively new initiative, and existing rules or guidelines are not adequate to assist designers. A shift towards a flexible design concept would allow terminal buildings to be designed to accommodate future changes and to make passengers’ journeys as simple, timely and hassle-free as possible. Currently available research indicates that a theoretical framework on a flexible design approach for airport terminals would facilitate the future design process. The generic principles of flexibility are investigated in the current research to incorporate flexible design approaches within the process of airport terminal design. A conceptual framework is proposed herein, which is expected to provide flexibility to current passenger terminal facilities within their corresponding locations as well as in future design and expansion

    The effect of dialect mismatch on likelihood-maximising speech enhancement for noise-robust speech recognition

    Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but these approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks are an alternative that optimise parameters of enhancement algorithms based on state sequences generated for utterances with known transcriptions. Previous reports of LIMA frameworks have shown significant promise for improving speech recognition accuracies under additive background noise for a range of speech enhancement techniques. In this paper we discuss the drawbacks of the LIMA approach when multiple layers of acoustic mismatch are present – namely background noise and speaker accent. Experimentation using LIMA-based Mel-filterbank noise subtraction on American and Australian English in-car speech databases supports this discussion, demonstrating that inferior speech recognition performance occurs when a second layer of mismatch is seen during evaluation

    A Continuous Speech Recognition Evaluation Protocol for the AVICAR Database

    The use of speech recognition in automotive environments has received increased attention in recent times. Unfortunately, evaluations of algorithms designed to improve recognition performance in this environment have been performed on differing data collections, making results difficult to compare. In recent years, the University of Illinois released a large in-car audio and visual data collection known as AVICAR ("audio-visual speech in a car") [1]. The AVICAR database is freely available, but to date no uniform evaluation protocol on which to perform experiments has been reported. This paper introduces a speaker-independent, continuous speech recognition evaluation protocol for the audio data of the AVICAR database. It is designed to allow for model adaptation, evaluation and testing using native English speakers. Baseline recognition results obtained using this protocol are also presented

    Check-in processing: simulation of passengers with advanced traits

    In order to tackle the growth of air travelers in airports worldwide, it is important to simulate and understand passenger flows to predict future capacity constraints and levels of service. We discuss the ability of agent-based models to understand complicated pedestrian movement in built environments. In this paper we propose advanced passenger traits to enable more detailed modelling of behaviors in terminal buildings, particularly in the departure hall around the check-in facilities. To demonstrate the concepts, we perform a series of passenger agent simulations in a virtual airport terminal. In doing so, we generate a spatial distribution of passengers within the departure hall to ancillary facilities such as cafes, information kiosks and phone booths as well as common check-in facilities, and observe the effects this has on passenger check-in and departure hall dwell times, and facility utilization

    Challenges in passenger terminal design: A conceptual model of passenger experience

    In recent years, de-regulation in the airline industry and the introduction of low-cost carriers have conspired to produce significant changes in the airport landscape. From an airport operator’s perspective, one of the most notable has been the shift of capital revenue from traditional airline sources (through exclusive use, long term lease arrangements) to passengers (by way of fees collected from ticket sales). As a result of these developments, passengers have become recognized as major stakeholders who have the power to influence airport profitability. This link between passenger satisfaction and profitability has generated industry wide interest in the “passenger experience”. In this paper, we define the factors which influence passenger experience, namely (a) artifacts, (b) services and (c) the terminal building, and explore the challenges that exist in the current approaches to terminal design. On the basis of these insights, we propose a conceptual model of passenger experience, and motivate its use as a framework for further research into improving terminal design from a passenger oriented perspective

    The use of phase in complex spectrum subtraction for robust speech recognition

    In this paper we propose a new method for utilising phase information by complementing it with traditional magnitude-only spectral subtraction speech enhancement through Complex Spectrum Subtraction (CSS). The proposed approach has the following advantages over traditional magnitude-only spectral subtraction: (a) it introduces complementary information to the enhancement algorithm; (b) it reduces the total number of algorithmic parameters; and (c) it is designed for improving clean speech magnitude spectra and is therefore suitable for both automatic speech recognition (ASR) and speech perception applications. Oracle-based ASR experiments verify this approach, showing an average of 20% relative word accuracy improvements when accurate estimates of the phase spectrum are available. Based on sinusoidal analysis and assuming stationarity between observations (which is shown to be better approximated as the frame rate is increased), this paper also proposes a novel method for acquiring the phase information called Phase Estimation via Delay Projection (PEDEP). Further oracle ASR experiments validate the potential for the proposed PEDEP technique in ideal conditions. Realistic implementation of CSS with PEDEP shows performance comparable to state-of-the-art spectral subtraction techniques in a range of 15-20 dB signal-to-noise ratio environments. These results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions
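To make the role of phase concrete, here is a small sketch comparing magnitude-only subtraction with subtraction in the complex domain for a single frequency bin. It assumes the standard additive model (noisy = clean + noise) and an oracle noise phase. It is an illustration only, not the paper's CSS or PEDEP implementation, and the function names are hypothetical.

```python
import cmath

def magnitude_only_estimate(noisy, noise_mag):
    """Clean magnitude estimate that ignores phase entirely."""
    return max(abs(noisy) - noise_mag, 0.0)

def complex_estimate(noisy, noise_mag, noise_phase):
    """Clean magnitude estimate after subtracting the noise as a complex value."""
    return abs(noisy - cmath.rect(noise_mag, noise_phase))

# One bin: clean speech of magnitude 1, noise of magnitude 1 at a 90 degree offset.
clean = 1 + 0j
noise = cmath.rect(1.0, cmath.pi / 2)
noisy = clean + noise

with_phase = complex_estimate(noisy, 1.0, cmath.pi / 2)
without_phase = magnitude_only_estimate(noisy, 1.0)
```

With the oracle phase the clean magnitude of 1 is recovered, whereas the magnitude-only estimate is about 0.41 because the two components are not in phase.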